
feat(runtime): add standalone edge runtime for Pi/Jetson deployment#799

Merged
mhamann merged 53 commits into main from feat-runtime-edge-standalone on Mar 30, 2026

Conversation

@rachmlenig
Contributor

Break the edge runtime's dependency on runtimes/universal/ by copying and trimming the needed files into runtimes/edge/. The edge runtime is now fully self-contained with its own pyproject.toml, Dockerfile, and zero imports from universal.

Key changes from universal:

  • models/__init__.py exports only 4 model types (was 12)
  • vision router includes only detection/classification/streaming (no training, evaluation, tracking, OCR, or document extraction)
  • chat_completions/service.py makes heavy utils optional (context summarizer, history compressor, tool calling, thinking)
  • file_handler.py rewritten without PyMuPDF (no PDF processing)
  • context_calculator.py makes torch import lazy for GGUF-only deploys

@github-actions
Contributor

github-actions bot commented Mar 12, 2026

All E2E Tests Passed!

Test Results by Platform

OS Mode Status
ubuntu-latest source ✅ Passed
ubuntu-latest binary ✅ Passed
macos-latest source ✅ Passed
macos-latest binary ✅ Passed
windows-latest source ✅ Passed
windows-latest binary ✅ Passed

This comment was automatically generated by the E2E Tests workflow.

Move 4 identical utility modules from both runtimes into
llamafarm_common so bugs fixed in one place apply everywhere:

- utils/safe_home.py → llamafarm_common/safe_home.py
- utils/device.py → llamafarm_common/device.py
- utils/model_cache.py → llamafarm_common/model_cache.py
- utils/model_format.py → llamafarm_common/model_format.py

Both runtimes now have thin re-export shims that import from
llamafarm_common, so all internal `from utils.X import Y`
statements continue to work unchanged.
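The shim mechanism can be demonstrated with a self-contained stand-in (here the stdlib `math` module plays the role of llamafarm_common, and the module names are invented for the demo):

```python
# Stand-in demonstration of the re-export shim pattern: a thin module of
# explicit re-exports, so old import paths keep resolving after the move.
import sys
import types

import math as _common_impl  # pretend this is llamafarm_common.device

# utils/device.py becomes nothing but explicit re-exports (no `import *`):
shim = types.ModuleType("utils_device_shim")
shim.sqrt = _common_impl.sqrt
shim.__all__ = ["sqrt"]
sys.modules["utils_device_shim"] = shim

# Call sites written against the old path keep working unchanged:
from utils_device_shim import sqrt
```

In the real tree the shim is an ordinary file, not a dynamically built module; the point is that only the shim knows where the implementation moved.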

Also:
- Add cachetools dep to llamafarm_common (needed by model_cache)
- Consolidate pidfile.py to use safe_home instead of duplicating
  home directory resolution logic
- Fix model_format.py internal import to use relative import
  within the common package

Removes ~1,100 lines of duplicated code across the two runtimes.

Candidates identified but not moved (would add heavy deps to common):
- core/logging.py (needs structlog)
- services/error_handler.py (needs fastapi)
- models/base.py, vision_base.py (architectural scope change)
- All router files (deeply coupled to FastAPI app structure)
Add HailoYOLOModel that runs YOLO inference on the Hailo-10H AI
accelerator using pre-compiled .hef models from the Hailo Model Zoo.

The server auto-detects Hailo hardware at startup:
- Checks for hailo_platform package
- Checks for PCI device ID 1e60 (Hailo-10H) via lspci
- Falls back to /dev/hailo0 device node
- Falls back to CPU/ultralytics if Hailo not available
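The detection cascade above might look roughly like this (a hedged sketch: the function name and exact check order are inferred from the description, not copied from server.py):

```python
# Illustrative Hailo auto-detection cascade: env override, then Python
# bindings, then PCI vendor ID 1e60, then the /dev/hailo0 device node.
import importlib.util
import os
import shutil
import subprocess


def hailo_available() -> bool:
    """Return True if a Hailo-10H accelerator looks usable on this host."""
    if os.environ.get("FORCE_CPU_VISION") == "1":
        return False  # explicit escape hatch to force the CPU backend
    # The Python bindings must be importable at all.
    if importlib.util.find_spec("hailo_platform") is None:
        return False
    # Prefer a positive PCI match (Hailo vendor ID 1e60) via lspci...
    if shutil.which("lspci"):
        out = subprocess.run(["lspci", "-n"], capture_output=True, text=True)
        if "1e60" in out.stdout:
            return True
    # ...falling back to the device node created by the kernel driver.
    return os.path.exists("/dev/hailo0")
```

If every check fails, load_detection_model() falls through to the CPU/ultralytics backend.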

New file: models/hailo_model.py
- HailoYOLOModel with same interface as YOLOModel
- Letterbox preprocessing for aspect-ratio-preserving resize
- NMS output parsing (Hailo .hef models include built-in NMS)
- COCO 80-class label mapping
- Configurable .hef directory via HAILO_HEF_DIR env var

Server changes:
- load_detection_model() selects backend based on hardware detection
- FORCE_CPU_VISION=1 env var to skip Hailo and force CPU
- hailo_platform import is fully optional (try/except)
Build from repo root with -f flag so COPY can reach common/ and
packages/. Use --no-sources to skip [tool.uv.sources] relative
paths that don't apply inside the container.

Usage: docker build -t edge-runtime -f runtimes/edge/Dockerfile .
Two bugs prevented llama.cpp from loading on ARM64 Linux (Pi/Jetson):

1. Version mismatch: _get_llamafarm_release_version() read the
   llamafarm-llama package version (0.1.0) but the ARM64 binary is
   published under the main monorepo release tag (v0.0.28). These
   versions are decoupled. Now queries GitHub API for the latest
   release, with LLAMAFARM_RELEASE_VERSION env var override and
   v0.0.28 hardcoded fallback.

2. Extension mismatch: manifest template used .tar.gz but the
   actual published asset is .zip. Fixed to match.
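The three-step resolution order described in fix 1 can be sketched as follows. The env var name and the v0.0.28 fallback come from this PR; the repo slug and function name are assumptions for illustration.

```python
# Illustrative release-version resolution: env override -> GitHub latest
# release -> hardcoded known-good fallback.
import json
import os
import urllib.request

FALLBACK_VERSION = "v0.0.28"


def resolve_release_version(repo: str = "llamafarm-ai/llamafarm") -> str:
    # 1. Explicit override wins (useful for pinning in CI or air-gapped boxes).
    override = os.environ.get("LLAMAFARM_RELEASE_VERSION")
    if override:
        return override
    # 2. Otherwise ask GitHub for the latest release tag...
    url = f"https://api.github.com/repos/{repo}/releases/latest"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp)["tag_name"]
    except Exception:
        # 3. ...and fall back to a known-good tag when offline or rate-limited.
        return FALLBACK_VERSION
```

The key property is that the package's own version (0.1.0) never enters the lookup, since it is decoupled from the monorepo release tag.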
The pre-built llama.cpp ARM64 binary requires GLIBC 2.38+ but
python:3.12-slim-bookworm only has GLIBC 2.36. Switch to
ubuntu:24.04 (GLIBC 2.39) and install Python via apt.

Also add --break-system-packages to uv pip install since Ubuntu
24.04 marks system Python as externally managed (PEP 668). This
is safe inside a container.
Address CodeQL review comments:
- Remove `from module import *` from all 8 re-export shims (edge +
  universal). The explicit imports already cover everything needed.
- Remove unused `get_file_images` import from edge server.py
Dockerfile:
- Install vision extra (ultralytics, transformers) and pi-heif
- Add system libs for OpenCV (libgl1, libglib2.0-0, libxcb1)
- Set YOLO_AUTOINSTALL=false to prevent runtime pip installs

Vision hardening:
- Strip whitespace from base64 input before decoding (fixes
  newlines from curl piping with jq/base64 tools)
- Wrap PIL Image.open() with proper error handling in vision_base,
  detect_classify — returns clear error instead of raw traceback
- Pre-register HEIF plugin in yolo_model.py to prevent import
  errors on some ultralytics builds
…edge

- Re-export HfApi and _check_local_cache_for_model from the
  universal model_format shim so tests that mock
  utils.model_format.HfApi continue to work
- Add project.json for edge runtime so Nx can process the project
  graph without failing on the unnamed directory
…/rag

The model_format tests mocked utils.model_format.HfApi, but after the
refactor to a re-export shim, detect_model_format lives in
llamafarm_common.model_format and uses its own HfApi reference. Fix mock
targets to patch at the source module.

The E2E source tests failed because UV_EXTRA_INDEX_URL and
UV_INDEX_STRATEGY leaked from the CI environment into server/rag
processes via os.Environ(). The PyTorch CPU index only has cp314 wheels
for markupsafe, causing install failures on Python 3.12. Strip these
vars from the base process environment so only services that explicitly
declare them (universal-runtime) receive them.
@rachmlenig rachmlenig marked this pull request as ready for review March 17, 2026 20:31
@qodo-free-for-open-source-projects
Contributor

Review Summary by Qodo

Add standalone edge runtime for Pi/Jetson deployment with GGUF inference and KV cache management

✨ Enhancement


Walkthroughs

Description
• Introduces a fully self-contained edge runtime for Pi/Jetson deployment with zero dependencies on
  runtimes/universal/
• Implements GGUF language model inference via llama-cpp with memory-optimized quantized model
  support for edge devices
• Adds multi-tier KV cache management (VRAM → RAM → disk) with segment-level validation for
  efficient prefix caching
• Provides OpenAI-compatible chat completions service with optional heavy utilities (context
  summarizer, history compressor, tool calling)
• Implements vision routers for detection, classification, and streaming with Hailo-10H accelerator
  support
• Adds comprehensive context management with multiple truncation strategies (sliding_window,
  keep_system, middle_out, summarize)
• Introduces thinking/reasoning model support with budget allocation and chain-of-thought utilities
• Consolidates device detection and model format utilities into common llamafarm_common package
  for code reuse
• Includes GPU allocation with VRAM estimation and SSRF-safe remote cascade support
• Provides GGML logging integration and metadata caching to optimize performance on constrained
  hardware
Diagram
flowchart LR
  A["Edge Runtime<br/>FastAPI Server"] --> B["GGUF Language<br/>Model"]
  A --> C["Vision Models<br/>Detection/Classification"]
  A --> D["KV Cache<br/>Manager"]
  B --> E["llama-cpp<br/>Inference"]
  B --> F["Context<br/>Calculator"]
  D --> G["Multi-tier Cache<br/>VRAM/RAM/Disk"]
  A --> H["Chat Completions<br/>Service"]
  H --> I["Optional Utils<br/>Summarizer/Compressor"]
  H --> J["Tool Calling<br/>& Thinking"]
  C --> K["Hailo-10H<br/>Accelerator"]
  L["Common Package"] -.->|Device Detection| A
  L -.->|Model Format| A


File Changes

1. runtimes/edge/models/gguf_language_model.py ✨ Enhancement +1640/-0

GGUF language model wrapper with llama-cpp integration

• New 1640-line GGUF language model wrapper using llama-cpp for quantized model inference on edge
 devices
• Implements unified memory GPU detection for Jetson/Tegra platforms with synchronous inference
 optimization
• Provides chat completion, streaming, audio input, and tool calling support with native Jinja2
 template rendering
• Includes context management, token counting, KV cache support, and comprehensive error handling
 for memory-constrained devices

runtimes/edge/models/gguf_language_model.py


2. runtimes/edge/routers/chat_completions/service.py ✨ Enhancement +1402/-0

Chat completions service with optional utilities and KV cache

• New 1402-line chat completions service for edge runtime with optional heavy utilities (context
 summarizer, history compressor, tool calling)
• Implements streaming and non-streaming responses with incremental tool call detection via state
 machine
• Supports native audio input for multimodal models and STT transcription fallback
• Includes KV cache management, context validation/truncation, thinking budget allocation, and
 comprehensive logging

runtimes/edge/routers/chat_completions/service.py


3. runtimes/edge/utils/gguf_metadata_cache.py ✨ Enhancement +302/-0

Centralized GGUF metadata caching for performance

• New 302-line shared GGUF metadata cache to avoid redundant file reads (~4-5 seconds per read)
• Caches file size, context length, chat template, special tokens, and architecture parameters for
 KV cache estimation
• Thread-safe implementation with fallback for newer GGUF quantization types not yet supported by
 Python gguf library
• Provides cache statistics and selective cache clearing functionality

runtimes/edge/utils/gguf_metadata_cache.py


4. runtimes/edge/utils/kv_cache_manager.py ✨ Enhancement +703/-0

Multi-tier KV cache manager with segment validation

• Implements multi-tier KV cache management (VRAM → RAM → disk) with segment-level validation for
 multi-agent model sharing
• Provides cache entry lifecycle (prepare, lookup, restore, save_after_generation) with
 deduplication and TTL support
• Includes budget enforcement, garbage collection, and background cleanup task for memory management
• Supports partial cache hits when only part of conversation has changed (system prompt, tools, or
 history turns)

runtimes/edge/utils/kv_cache_manager.py
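The tier order (VRAM → RAM → disk) implies lookups that probe fast tiers first and promote hits. A toy sketch under the assumption that cached KV state is opaque to the manager; class and method names here are invented:

```python
# Toy multi-tier KV cache: probe tiers fastest-first, promote hits to VRAM.
# Real entries hold llama.cpp KV state plus validation metadata; here they
# are just opaque bytes to show the tier mechanics.
class TieredKVCache:
    TIERS = ("vram", "ram", "disk")

    def __init__(self):
        self.tiers = {t: {} for t in self.TIERS}

    def put(self, key, state, tier="vram"):
        self.tiers[tier][key] = state

    def lookup(self, key):
        """Return (tier_hit, state); move the entry to the fastest tier."""
        for tier in self.TIERS:
            if key in self.tiers[tier]:
                state = self.tiers[tier].pop(key)
                self.tiers["vram"][key] = state  # promote on access
                return tier, state
        return None, None
```

The segment-level validation described above would sit on top of this: a hit is only usable for the prefix segments (system prompt, tools, history turns) that still match.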


5. runtimes/edge/utils/context_calculator.py ✨ Enhancement +477/-0

Context size calculator with memory-aware optimization

• Computes optimal context window size based on available memory, model architecture, and user
 configuration
• Implements four-tier priority system: user config → model training context → pattern defaults →
 computed max
• Calculates exact KV cache bytes per token from GGUF metadata (n_layer, n_head_kv, head sizes)
• Provides lazy torch import for GGUF-only deployments without GPU dependencies

runtimes/edge/utils/context_calculator.py


6. runtimes/edge/utils/tool_calling.py ✨ Enhancement +555/-0

Prompt-based tool calling with XML tag parsing

• Implements prompt-based tool calling with XML tag injection and detection for model outputs
• Supports multiple tool_choice modes: auto, none, required, and specific function forcing
• Provides incremental streaming utilities to extract tool names and arguments from partial JSON
• Includes tool schema validation and comprehensive error handling for malformed tool calls

runtimes/edge/utils/tool_calling.py
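The XML-tag detection described above can be illustrated as below. The `<tool_call>` tag name and JSON payload shape are assumptions for the sketch, not necessarily the edge runtime's exact wire format:

```python
# Toy prompt-based tool-call extraction: find an XML-style tag in the model
# output and parse its JSON payload, tolerating malformed calls.
import json
import re

_TOOL_CALL = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_call(output: str):
    """Return (name, arguments) if the model emitted a tagged tool call."""
    m = _TOOL_CALL.search(output)
    if not m:
        return None
    try:
        payload = json.loads(m.group(1))
        return payload["name"], payload.get("arguments", {})
    except (json.JSONDecodeError, KeyError):
        return None  # malformed call: caller falls back to plain content
```

The streaming utilities in the real module do the harder incremental version of this: recognizing the opening tag and extracting the tool name before the JSON is complete.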


7. runtimes/edge/utils/context_manager.py ✨ Enhancement +506/-0

Context window management with truncation strategies

• Manages context window with multiple truncation strategies: sliding_window, keep_system,
 middle_out, summarize
• Validates messages fit within context budget and applies truncation when needed
• Implements content truncation as fallback when message removal is insufficient
• Provides context usage tracking including truncation metadata for API responses

runtimes/edge/utils/context_manager.py
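Two of the named strategies can be sketched together, since keep_system is sliding_window with the system messages pinned. A minimal sketch with a stubbed token count (the real code counts with the model's tokenizer):

```python
def _n_tokens(msg):
    # crude stand-in; the real code counts tokens with the model's tokenizer
    return len(msg["content"].split())


def truncate(messages, budget, strategy="sliding_window"):
    """Drop the oldest droppable messages until the estimate fits the budget."""
    if strategy == "keep_system":
        system = [m for m in messages if m["role"] == "system"]
        rest = [m for m in messages if m["role"] != "system"]
    else:  # sliding_window: every message is droppable
        system, rest = [], list(messages)
    used = sum(_n_tokens(m) for m in system)
    kept = []
    for m in reversed(rest):  # walk newest-first so recent turns survive
        if used + _n_tokens(m) > budget:
            break
        kept.append(m)
        used += _n_tokens(m)
    return system + list(reversed(kept))
```

middle_out and summarize diverge from this shape: the former drops from the middle of the conversation, the latter replaces old turns with an LLM-generated summary instead of deleting them.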


8. runtimes/edge/models/hailo_model.py ✨ Enhancement +357/-0

Hailo-10H YOLO detection model integration

• Implements YOLO detection model for Hailo-10H AI accelerator using pre-compiled .hef models
• Provides letterboxing preprocessing and NMS output parsing for bounding box extraction
• Supports model variant mapping (yolov8n, yolov11n, etc.) with fallback to VISION_MODELS_DIR
• Includes async load/unload and inference with thread pool execution to avoid event loop blocking

runtimes/edge/models/hailo_model.py


9. runtimes/edge/server.py ✨ Enhancement +437/-0

Edge runtime FastAPI server with hardware detection

• Minimal FastAPI server for on-device inference on constrained hardware (Raspberry Pi, Jetson)
• Implements hardware detection for Hailo-10H accelerator with CPU fallback for vision models
• Provides model lifecycle management with TTL-based unloading and background cleanup task
• Integrates KV cache manager, chat completions, health checks, and vision routers
 (detection/classification/streaming only)

runtimes/edge/server.py


10. runtimes/edge/routers/chat_completions/types.py ✨ Enhancement +307/-0

Chat completion types with audio and cache support

• Defines OpenAI-compatible chat completion request/response types with audio and tool calling
 support
• Includes audio content extraction and STT transcription fallback utilities
• Adds KV cache parameters (cache_key, return_cache_key) and context management options
 (auto_truncate, truncation_strategy)
• Provides thinking/reasoning model support with separate thinking content and token tracking

runtimes/edge/routers/chat_completions/types.py


11. runtimes/edge/routers/vision/__init__.py ✨ Enhancement +32/-0

Edge vision router aggregation (detection/classification only)

• Combines edge-specific vision routers (detection, classification, detect_classify, streaming)
• Excludes OCR, document extraction, training, evaluation, tracking, and sample data endpoints
• Exports loader setters and session cleanup functions for dependency injection

runtimes/edge/routers/vision/__init__.py


12. runtimes/universal/utils/model_format.py Refactoring +4/-152

Consolidate model format detection to common library

• Simplifies to re-export model format utilities from llamafarm_common as single source of truth
• Removes local caching and HuggingFace API logic previously duplicated in universal runtime
• Maintains backward compatibility by re-exporting GGUF utilities and quantization preferences

runtimes/universal/utils/model_format.py


13. runtimes/edge/utils/gpu_allocator.py ✨ Enhancement +349/-0

GPU allocation and VRAM estimation for edge runtime

• New GPU allocation module for multi-model, multi-GPU GGUF inference
• Implements single-GPU placement preference with multi-GPU fallback strategy
• Estimates VRAM requirements using model size, context window, and KV cache calculations
• Provides SSRF-safe remote cascade support with allowlist validation

runtimes/edge/utils/gpu_allocator.py


14. runtimes/edge/routers/vision/streaming.py ✨ Enhancement +385/-0

Streaming vision detection with cascade chain support

• New streaming vision router with cascade detection chain support
• Implements session management with TTL-based cleanup for orphaned streams
• Supports both local and remote model cascading with SSRF protection
• Provides endpoints for stream start/stop, frame processing, and session listing

runtimes/edge/routers/vision/streaming.py


15. runtimes/edge/utils/thinking.py ✨ Enhancement +268/-0

Thinking model utilities for chain-of-thought reasoning

• New utilities for parsing and controlling thinking/reasoning in models like Qwen3
• Implements parse_thinking_response() to extract <think> tags from model output
• Provides inject_thinking_control() to inject /think and /no_think soft switches
• Includes ThinkingBudgetProcessor logits processor to enforce token budget limits

runtimes/edge/utils/thinking.py
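The parse step is essentially splitting the output around a `<think>` block, as Qwen3-style models emit reasoning that way. A minimal sketch (handles a single block; the real parser may cover streaming and unclosed tags):

```python
# Split model output into (thinking, visible_answer) around a <think> block.
import re

_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def parse_thinking_response(text: str):
    m = _THINK.search(text)
    if not m:
        return "", text.strip()
    answer = (text[: m.start()] + text[m.end():]).strip()
    return m.group(1).strip(), answer
```

The ThinkingBudgetProcessor complements this at generation time: once the thinking token budget is spent, it biases the logits toward emitting the closing tag.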


16. runtimes/edge/routers/cache.py ✨ Enhancement +243/-0

KV cache management API for prefix caching

• New KV cache API with prepare, validate, list, evict, stats, and GC endpoints
• Supports warm cache preparation with model loading and KV state pre-computation
• Implements cache validation without consuming the cache entry
• Provides cache statistics, garbage collection, and per-entry eviction

runtimes/edge/routers/cache.py


17. runtimes/edge/utils/context_summarizer.py ✨ Enhancement +239/-0

LLM-based conversation history summarization

• New context summarization module using LLM-based compression of conversation history
• Preserves recent messages while summarizing older ones to reduce token count
• Integrates with server's model caching mechanism for efficient summarization model loading
• Configurable model selection and number of recent exchanges to preserve

runtimes/edge/utils/context_summarizer.py


18. runtimes/edge/utils/history_compressor.py ✨ Enhancement +259/-0

Conversation history compression utilities

• New lossless and near-lossless compression techniques for conversation history
• Applies whitespace normalization, tool result truncation, code block compression
• Removes duplicate/near-duplicate content while preserving recent messages
• Reduces token usage before truncation without losing semantic meaning

runtimes/edge/utils/history_compressor.py


19. runtimes/edge/models/yolo_model.py ✨ Enhancement +175/-0

YOLO object detection model wrapper

• New YOLO object detection model wrapper supporting YOLOv8/v11 via ultralytics
• Implements detection, training, and export functionality with device auto-detection
• Includes path validation for security and pi_heif import error suppression
• Supports class filtering and confidence threshold customization

runtimes/edge/models/yolo_model.py


20. runtimes/edge/models/language_model.py ✨ Enhancement +222/-0

HuggingFace language model wrapper

• New language model wrapper for HuggingFace causal language models
• Implements both non-streaming and streaming text generation with chat templates
• Uses asyncio.to_thread() to avoid blocking the FastAPI event loop during loading
• Supports temperature, top-p sampling, and stop sequences

runtimes/edge/models/language_model.py


21. runtimes/edge/models/clip_model.py ✨ Enhancement +185/-0

CLIP image classification and embedding model

• New CLIP-based image classification and embedding model wrapper
• Implements zero-shot classification with pre-computed class embeddings
• Provides image and text embedding generation with caching of class embeddings
• Supports multiple CLIP variants (ViT-base, ViT-large, SigLIP)

runtimes/edge/models/clip_model.py


22. runtimes/edge/routers/vision/detect_classify.py ✨ Enhancement +180/-0

Combined detection and classification endpoint

• New detect+classify combo endpoint combining YOLO detection with CLIP classification
• Crops detected regions and classifies each crop in a single round-trip
• Includes minimum crop size filtering and RGB mode conversion for JPEG encoding
• Returns unified results with both detection and classification confidence scores

runtimes/edge/routers/vision/detect_classify.py


23. runtimes/universal/utils/device.py Refactoring +9/-195

Device utilities refactored to common package

• Refactored to re-export device utilities from llamafarm_common package
• Establishes single source of truth for device detection across runtimes
• Maintains backward compatibility with existing imports

runtimes/universal/utils/device.py


24. common/llamafarm_common/device.py ✨ Enhancement +195/-0

Common device detection and optimization utilities

• New common device detection module with lazy PyTorch loading
• Implements optimal device selection (CUDA, MPS, CPU) with environment variable overrides
• Provides detailed device info including per-GPU memory statistics
• Includes GGUF-specific GPU layer configuration independent of PyTorch

common/llamafarm_common/device.py


25. runtimes/edge/utils/ggml_logging.py ✨ Enhancement +184/-0

GGML/llama.cpp logging integration

• New GGML logging management module routing llama.cpp logs through Python logging
• Implements three modes: capture (default), suppress, and passthrough
• Handles log level mapping and buffers partial lines for complete message logging
• Downgrades known false-error messages to DEBUG level

runtimes/edge/utils/ggml_logging.py


26. runtimes/edge/models/vision_base.py ✨ Enhancement +188/-0

Base classes for vision models

• New base classes for vision models (detection and classification)
• Defines result dataclasses for detection, classification, and embeddings
• Provides image conversion utilities (bytes/numpy to PIL/numpy)
• Implements device resolution and model info retrieval

runtimes/edge/models/vision_base.py


27. runtimes/edge/models/base.py ✨ Enhancement +156/-0

Abstract base class for all model types

• New abstract base class for all HuggingFace models (transformers, diffusers)
• Implements common lifecycle methods (load, unload) with GPU cache clearing
• Provides dtype selection and tensor device movement utilities
• Includes platform-specific optimizations for MPS and CUDA

runtimes/edge/models/base.py


28. cli/cmd/orchestrator/python_env.go Additional files +21/-15

...

cli/cmd/orchestrator/python_env.go


29. cli/cmd/orchestrator/services.go Additional files +2/-2

...

cli/cmd/orchestrator/services.go


30. common/llamafarm_common/__init__.py Additional files +8/-0

...

common/llamafarm_common/__init__.py


31. common/llamafarm_common/model_cache.py Additional files +188/-0

...

common/llamafarm_common/model_cache.py


32. common/llamafarm_common/model_format.py Additional files +172/-0

...

common/llamafarm_common/model_format.py


33. common/llamafarm_common/pidfile.py Additional files +3/-7

...

common/llamafarm_common/pidfile.py


34. common/llamafarm_common/safe_home.py Additional files +34/-0

...

common/llamafarm_common/safe_home.py


35. common/pyproject.toml Additional files +1/-0

...

common/pyproject.toml


36. packages/llamafarm-llama/src/llamafarm_llama/_binary.py Additional files +40/-9

...

packages/llamafarm-llama/src/llamafarm_llama/_binary.py


37. runtimes/edge/Dockerfile Additional files +69/-0

...

runtimes/edge/Dockerfile


38. runtimes/edge/config/model_context_defaults.yaml Additional files +34/-0

...

runtimes/edge/config/model_context_defaults.yaml


39. runtimes/edge/core/__init__.py Additional files +0/-0

...

runtimes/edge/core/__init__.py


40. runtimes/edge/core/logging.py Additional files +156/-0

...

runtimes/edge/core/logging.py


41. runtimes/edge/models/__init__.py Additional files +45/-0

...

runtimes/edge/models/__init__.py


42. runtimes/edge/openapi.json Additional files +1/-0

...

runtimes/edge/openapi.json


43. runtimes/edge/project.json Additional files +31/-0

...

runtimes/edge/project.json


44. runtimes/edge/pyproject.toml Additional files +88/-0

...

runtimes/edge/pyproject.toml


45. runtimes/edge/routers/__init__.py Additional files +0/-0

...

runtimes/edge/routers/__init__.py


46. runtimes/edge/routers/chat_completions/__init__.py Additional files +3/-0

...

runtimes/edge/routers/chat_completions/__init__.py


47. runtimes/edge/routers/chat_completions/router.py Additional files +26/-0

...

runtimes/edge/routers/chat_completions/router.py


48. runtimes/edge/routers/health/__init__.py Additional files +5/-0

...

runtimes/edge/routers/health/__init__.py


49. runtimes/edge/routers/health/router.py Additional files +75/-0

...

runtimes/edge/routers/health/router.py


50. runtimes/edge/routers/vision/classification.py Additional files +61/-0

...

runtimes/edge/routers/vision/classification.py


51. runtimes/edge/routers/vision/detection.py Additional files +76/-0

...

runtimes/edge/routers/vision/detection.py


52. runtimes/edge/routers/vision/utils.py Additional files +22/-0

...

runtimes/edge/routers/vision/utils.py


53. runtimes/edge/services/__init__.py Additional files +0/-0

...

runtimes/edge/services/__init__.py


54. runtimes/edge/services/error_handler.py Additional files +143/-0

...

runtimes/edge/services/error_handler.py


55. runtimes/edge/utils/__init__.py Additional files +0/-0

...

runtimes/edge/utils/__init__.py


56. runtimes/edge/utils/device.py Additional files +9/-0

...

runtimes/edge/utils/device.py


57. runtimes/edge/utils/file_handler.py Additional files +213/-0

...

runtimes/edge/utils/file_handler.py


58. runtimes/edge/utils/jinja_tools.py Additional files +192/-0

...

runtimes/edge/utils/jinja_tools.py


59. runtimes/edge/utils/model_cache.py Additional files +4/-0

...

runtimes/edge/utils/model_cache.py


60. runtimes/edge/utils/model_format.py Additional files +24/-0

...

runtimes/edge/utils/model_format.py


61. runtimes/edge/utils/safe_home.py Additional files +4/-0

...

runtimes/edge/utils/safe_home.py


62. runtimes/edge/utils/token_counter.py Additional files +153/-0

...

runtimes/edge/utils/token_counter.py


63. runtimes/universal/tests/test_model_format.py Additional files +12/-12

...

runtimes/universal/tests/test_model_format.py


64. runtimes/universal/utils/model_cache.py Additional files +3/-187

...

runtimes/universal/utils/model_cache.py


65. runtimes/universal/utils/safe_home.py Additional files +3/-33

...

runtimes/universal/utils/safe_home.py




@qodo-free-for-open-source-projects
Contributor

qodo-free-for-open-source-projects bot commented Mar 17, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (3) 📎 Requirement gaps (0)



Action required

1. Undocumented UV_INDEX_STRATEGY env var 📘 Rule violation ⛯ Reliability
Description
The PR introduces UV_INDEX_STRATEGY as a configurable environment variable for
universal-runtime, but it is not documented in .env.example. This can lead to misconfiguration
(especially in CI) and violates the requirement to document new configuration keys.
Code

cli/cmd/orchestrator/services.go[R151-154]

"HF_TOKEN":                 "",
// In CI environments, use CPU-only PyTorch to avoid downloading 3GB+ of CUDA packages
-			"UV_EXTRA_INDEX_URL":  "${UV_EXTRA_INDEX_URL}",
-			"UV_INDEX_STRATEGY":   "", // Inherit from parent env (e.g. unsafe-best-match in CI)
+			"UV_EXTRA_INDEX_URL": "${UV_EXTRA_INDEX_URL}",
+			"UV_INDEX_STRATEGY":  "${UV_INDEX_STRATEGY}",
Evidence
UV_INDEX_STRATEGY is now read from the parent environment and passed into the universal-runtime
service, which makes it a user/CI-configurable key. .env.example contains no entry for
UV_INDEX_STRATEGY (or UV index settings), so the new key is not documented as required.

AGENTS.md
AGENTS.md
cli/cmd/orchestrator/services.go[149-155]
.env.example[43-55]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The PR adds a new configurable environment variable (`UV_INDEX_STRATEGY`) used by the orchestrator when launching `universal-runtime`, but `.env.example` was not updated to include it.
## Issue Context
Users/CI may need to set this variable for installs (e.g., CPU-only PyTorch index behavior). The repo compliance rules require new configuration keys to be documented in `.env.example` and kept consistent with configuration changes.
## Fix Focus Areas
- cli/cmd/orchestrator/services.go[149-155]
- .env.example[43-55]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. One-line stub function defs 📘 Rule violation ✓ Correctness
Description
Several fallback stub functions are defined on a single line, which commonly violates Ruff/PEP8
rules (e.g., E701) and can fail repo lint checks. This reduces readability and consistency with
repository Python style conventions.
Code

runtimes/edge/routers/chat_completions/service.py[R60-67]

+    # No-op stubs — edge doesn't support tool calling
+    def detect_probable_tool_call(*a, **kw): return False  # type: ignore[misc]
+    def detect_tool_call_in_content(*a, **kw): return None  # type: ignore[misc]
+    def extract_arguments_progress(*a, **kw): return ""  # type: ignore[misc]
+    def extract_tool_name_from_partial(*a, **kw): return None  # type: ignore[misc]
+    def is_tool_call_complete(*a, **kw): return False  # type: ignore[misc]
+    def parse_tool_choice(*a, **kw): return None  # type: ignore[misc]
+    def strip_tool_call_from_content(*a, **kw): return a[0] if a else ""  # type: ignore[misc]
Evidence
Repository Python style requires Ruff-compatible formatting; single-line def ...: return ...
statements are typically flagged by Ruff/PEP8 (E701) and are inconsistent with the stated
conventions.

AGENTS.md
runtimes/edge/routers/chat_completions/service.py[58-67]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Single-line stub function definitions in the ImportError fallback are likely to violate Ruff/PEP8 (e.g., E701) and reduce readability.
## Issue Context
These stubs are used when optional tool-calling support is unavailable. They should still conform to repository Python style so CI linting passes.
## Fix Focus Areas
- runtimes/edge/routers/chat_completions/service.py[60-67]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
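One E701-safe rewrite of the stubs, shown for two of the seven (the others follow the same shape); this is a possible remediation, not the committed fix:

```python
# Multi-line no-op stubs for when optional tool-calling support is absent;
# same behavior as the flagged one-liners, but Ruff/PEP8-compliant.
def detect_probable_tool_call(*a, **kw):  # type: ignore[misc]
    return False


def parse_tool_choice(*a, **kw):  # type: ignore[misc]
    return None
```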


3. export() signature too long 📘 Rule violation ✓ Correctness
Description
The export() method signature exceeds the repository’s 88-character line length limit. This is
likely to trigger Ruff line-length checks and violates the repo’s Python style conventions.
Code

runtimes/edge/models/vision_base.py[R160-162]

+    async def export(self, format: Literal["onnx", "coreml", "tensorrt", "tflite", "openvino"],
+                     output_path: str, **kwargs) -> str:
+        raise NotImplementedError(f"{self.__class__.__name__} does not support export to {format}")
Evidence
The compliance checklist requires Python code to respect the 88-character line length; the
export() definition line is substantially longer than that limit.

AGENTS.md
runtimes/edge/models/vision_base.py[156-162]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The `export()` method definition exceeds the repository’s 88-character line limit.
## Issue Context
Repo style conventions (Ruff/PEP8 variant) enforce line length, and long signatures should be wrapped for readability and lint compliance.
## Fix Focus Areas
- runtimes/edge/models/vision_base.py[160-162]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
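A wrapped signature that fits the 88-character limit with behavior unchanged (one possible remediation; the type alias is optional):

```python
# Same export() contract as the flagged code, reflowed to satisfy the
# 88-character line limit.
from typing import Literal

ExportFormat = Literal["onnx", "coreml", "tensorrt", "tflite", "openvino"]


class VisionModel:
    async def export(
        self,
        format: ExportFormat,
        output_path: str,
        **kwargs,
    ) -> str:
        raise NotImplementedError(
            f"{self.__class__.__name__} does not support export to {format}"
        )
```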


4. openapi.json missing final newline 📘 Rule violation ✓ Correctness
Description
The newly added runtimes/edge/openapi.json is committed without a final newline. This violates
.editorconfig-style newline handling expectations and can cause formatting inconsistencies across
tools.
Code

runtimes/edge/openapi.json[1]

+{"openapi":"3.1.0","info":{"title":"LlamaFarm Edge Runtime","description":"Minimal on-device inference API for drones and edge hardware","version":"0.1.0"},"paths":{"/health":{"get":{"tags":["health"],"summary":"Health Check","description":"Health check endpoint with device information.","operationId":"health_check_health_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}}}}},"/v1/models":{"get":{"tags":["health"],"summary":"List Models","description":"List currently loaded models.","operationId":"list_models_v1_models_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}}}}},"/v1/chat/completions":{"post":{"summary":"Chat Completions","description":"OpenAI-compatible chat completions endpoint.\n\nSupports any HuggingFace causal language model.","operationId":"chat_completions_v1_chat_completions_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ChatCompletionRequest"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/vision/detect":{"post":{"tags":["vision","vision-detection"],"summary":"Detect Objects","description":"Detect objects in an image using YOLO.","operationId":"detect_objects_v1_vision_detect_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/DetectRequest"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/DetectResponse"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/vision/classify":{"post":{"tags":["vision","vision-classification"],"summary":"Classify 
Image","description":"Classify an image using CLIP (zero-shot).","operationId":"classify_image_v1_vision_classify_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ClassifyRequest"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ClassifyResponse"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/vision/detect_classify":{"post":{"tags":["vision","vision-detect-classify"],"summary":"Detect And Classify","description":"Detect objects then classify each crop — single round-trip.\n\nRuns YOLO detection → crops each bounding box → CLIP classifies each crop.\nReturns unified results with both detection and classification info.","operationId":"detect_and_classify_v1_vision_detect_classify_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/DetectClassifyRequest"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/DetectClassifyResponse"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/vision/stream/start":{"post":{"tags":["vision","vision-streaming"],"summary":"Start Stream","description":"Start a streaming detection session with cascade config.","operationId":"start_stream_v1_vision_stream_start_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/StreamStartRequest"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/StreamStartResponse"}}}},"422":{"description":"Validation 
Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/vision/stream/frame":{"post":{"tags":["vision","vision-streaming"],"summary":"Process Frame","description":"Process a frame through the cascade chain.","operationId":"process_frame_v1_vision_stream_frame_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/StreamFrameRequest"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/StreamFrameResponse"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/vision/stream/stop":{"post":{"tags":["vision","vision-streaming"],"summary":"Stop Stream","description":"Stop a streaming session.","operationId":"stop_stream_v1_vision_stream_stop_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/StreamStopRequest"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/StreamStopResponse"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/vision/stream/sessions":{"get":{"tags":["vision","vision-streaming"],"summary":"List Sessions","description":"List active streaming sessions.","operationId":"list_sessions_v1_vision_stream_sessions_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/SessionsListResponse"}}}}}}},"/v1/models/unload":{"post":{"tags":["models"],"summary":"Unload All Models","description":"Unload all loaded models to free memory.","operationId":"unload_all_models_v1_models_unload_post","responses":{"200":{"description":"Successful 
Response","content":{"application/json":{"schema":{}}}}}}}},"components":{"schemas":{"Audio":{"properties":{"id":{"type":"string","title":"Id"}},"type":"object","required":["id"],"title":"Audio","description":"Data about a previous audio response from the model.\n[Learn more](https://platform.openai.com/docs/guides/audio)."},"BoundingBox":{"properties":{"x1":{"type":"number","title":"X1"},"y1":{"type":"number","title":"Y1"},"x2":{"type":"number","title":"X2"},"y2":{"type":"number","title":"Y2"}},"type":"object","required":["x1","y1","x2","y2"],"title":"BoundingBox"},"CascadeConfigRequest":{"properties":{"chain":{"items":{"type":"string"},"type":"array","title":"Chain","description":"Model chain, can include 'remote:http://...'","default":["yolov8n"]},"confidence_threshold":{"type":"number","maximum":1.0,"minimum":0.0,"title":"Confidence Threshold","default":0.7}},"type":"object","title":"CascadeConfigRequest"},"ChatCompletionAssistantMessageParam":{"properties":{"role":{"type":"string","const":"assistant","title":"Role"},"audio":{"anyOf":[{"$ref":"#/components/schemas/Audio"},{"type":"null"}]},"content":{"anyOf":[{"type":"string"},{"items":{"anyOf":[{"$ref":"#/components/schemas/ChatCompletionContentPartTextParam"},{"$ref":"#/components/schemas/ChatCompletionContentPartRefusalParam"}]},"type":"array"},{"type":"null"}],"title":"Content"},"function_call":{"anyOf":[{"$ref":"#/components/schemas/FunctionCall"},{"type":"null"}]},"name":{"type":"string","title":"Name"},"refusal":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Refusal"},"tool_calls":{"items":{"anyOf":[{"$ref":"#/components/schemas/ChatCompletionMessageFunctionToolCallParam"},{"$ref":"#/components/schemas/ChatCompletionMessageCustomToolCallParam"}]},"type":"array","title":"Tool Calls"}},"type":"object","required":["role"],"title":"ChatCompletionAssistantMessageParam","description":"Messages sent by the model in response to user 
messages."},"ChatCompletionContentPartImageParam":{"properties":{"image_url":{"$ref":"#/components/schemas/ImageURL"},"type":{"type":"string","const":"image_url","title":"Type"}},"type":"object","required":["image_url","type"],"title":"ChatCompletionContentPartImageParam","description":"Learn about [image inputs](https://platform.openai.com/docs/guides/vision)."},"ChatCompletionContentPartInputAudioParam":{"properties":{"input_audio":{"$ref":"#/components/schemas/InputAudio"},"type":{"type":"string","const":"input_audio","title":"Type"}},"type":"object","required":["input_audio","type"],"title":"ChatCompletionContentPartInputAudioParam","description":"Learn about [audio inputs](https://platform.openai.com/docs/guides/audio)."},"ChatCompletionContentPartRefusalParam":{"properties":{"refusal":{"type":"string","title":"Refusal"},"type":{"type":"string","const":"refusal","title":"Type"}},"type":"object","required":["refusal","type"],"title":"ChatCompletionContentPartRefusalParam"},"ChatCompletionContentPartTextParam":{"properties":{"text":{"type":"string","title":"Text"},"type":{"type":"string","const":"text","title":"Type"}},"type":"object","required":["text","type"],"title":"ChatCompletionContentPartTextParam","description":"Learn about [text inputs](https://platform.openai.com/docs/guides/text-generation)."},"ChatCompletionDeveloperMessageParam":{"properties":{"content":{"anyOf":[{"type":"string"},{"items":{"$ref":"#/components/schemas/ChatCompletionContentPartTextParam"},"type":"array"}],"title":"Content"},"role":{"type":"string","const":"developer","title":"Role"},"name":{"type":"string","title":"Name"}},"type":"object","required":["content","role"],"title":"ChatCompletionDeveloperMessageParam","description":"Developer-provided instructions that the model should follow, regardless of\nmessages sent by the user. 
With o1 models and newer, `developer` messages\nreplace the previous `system` messages."},"ChatCompletionFunctionMessageParam":{"properties":{"content":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Content"},"name":{"type":"string","title":"Name"},"role":{"type":"string","const":"function","title":"Role"}},"type":"object","required":["content","name","role"],"title":"ChatCompletionFunctionMessageParam"},"ChatCompletionFunctionToolParam":{"properties":{"function":{"$ref":"#/components/schemas/FunctionDefinition"},"type":{"type":"string","const":"function","title":"Type"}},"type":"object","required":["function","type"],"title":"ChatCompletionFunctionToolParam","description":"A function tool that can be used to generate a response."},"ChatCompletionMessageCustomToolCallParam":{"properties":{"id":{"type":"string","title":"Id"},"custom":{"$ref":"#/components/schemas/Custom"},"type":{"type":"string","const":"custom","title":"Type"}},"type":"object","required":["id","custom","type"],"title":"ChatCompletionMessageCustomToolCallParam","description":"A call to a custom tool created by the model."},"ChatCompletionMessageFunctionToolCallParam":{"properties":{"id":{"type":"string","title":"Id"},"function":{"$ref":"#/components/schemas/Function"},"type":{"type":"string","const":"function","title":"Type"}},"type":"object","required":["id","function","type"],"title":"ChatCompletionMessageFunctionToolCallParam","description":"A call to a function tool created by the 
model."},"ChatCompletionRequest":{"properties":{"model":{"type":"string","title":"Model"},"messages":{"items":{"anyOf":[{"$ref":"#/components/schemas/ChatCompletionDeveloperMessageParam"},{"$ref":"#/components/schemas/ChatCompletionSystemMessageParam"},{"$ref":"#/components/schemas/ChatCompletionUserMessageParam"},{"$ref":"#/components/schemas/ChatCompletionAssistantMessageParam"},{"$ref":"#/components/schemas/ChatCompletionToolMessageParam"},{"$ref":"#/components/schemas/ChatCompletionFunctionMessageParam"}]},"type":"array","title":"Messages"},"temperature":{"anyOf":[{"type":"number"},{"type":"null"}],"title":"Temperature","default":1.0},"top_p":{"anyOf":[{"type":"number"},{"type":"null"}],"title":"Top P","default":1.0},"max_tokens":{"anyOf":[{"type":"integer"},{"type":"null"}],"title":"Max Tokens"},"stream":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Stream","default":false},"stop":{"anyOf":[{"type":"string"},{"items":{"type":"string"},"type":"array"},{"type":"null"}],"title":"Stop"},"logprobs":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Logprobs"},"top_logprobs":{"anyOf":[{"type":"integer","maximum":20.0,"minimum":0.0},{"type":"null"}],"title":"Top Logprobs"},"presence_penalty":{"anyOf":[{"type":"number"},{"type":"null"}],"title":"Presence Penalty","default":0.0},"frequency_penalty":{"anyOf":[{"type":"number"},{"type":"null"}],"title":"Frequency Penalty","default":0.0},"user":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"User"},"n_ctx":{"anyOf":[{"type":"integer"},{"type":"null"}],"title":"N Ctx"},"n_batch":{"anyOf":[{"type":"integer"},{"type":"null"}],"title":"N Batch"},"n_gpu_layers":{"anyOf":[{"type":"integer"},{"type":"null"}],"title":"N Gpu Layers"},"n_threads":{"anyOf":[{"type":"integer"},{"type":"null"}],"title":"N Threads"},"flash_attn":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Flash Attn"},"use_mmap":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Use 
Mmap"},"use_mlock":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Use Mlock"},"cache_type_k":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Cache Type K"},"cache_type_v":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Cache Type V"},"extra_body":{"anyOf":[{"additionalProperties":true,"type":"object"},{"type":"null"}],"title":"Extra Body"},"tools":{"anyOf":[{"items":{"$ref":"#/components/schemas/ChatCompletionFunctionToolParam"},"type":"array"},{"type":"null"}],"title":"Tools"},"tool_choice":{"anyOf":[{"type":"string"},{"additionalProperties":true,"type":"object"},{"type":"null"}],"title":"Tool Choice"},"think":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Think"},"thinking_budget":{"anyOf":[{"type":"integer"},{"type":"null"}],"title":"Thinking Budget"},"cache_key":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Cache Key"},"return_cache_key":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Return Cache Key"},"auto_truncate":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Auto Truncate","default":true},"truncation_strategy":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Truncation Strategy"}},"type":"object","required":["model","messages"],"title":"ChatCompletionRequest","description":"OpenAI-compatible chat completion request."},"ChatCompletionSystemMessageParam":{"properties":{"content":{"anyOf":[{"type":"string"},{"items":{"$ref":"#/components/schemas/ChatCompletionContentPartTextParam"},"type":"array"}],"title":"Content"},"role":{"type":"string","const":"system","title":"Role"},"name":{"type":"string","title":"Name"}},"type":"object","required":["content","role"],"title":"ChatCompletionSystemMessageParam","description":"Developer-provided instructions that the model should follow, regardless of\nmessages sent by the user. 
With o1 models and newer, use `developer` messages\nfor this purpose instead."},"ChatCompletionToolMessageParam":{"properties":{"content":{"anyOf":[{"type":"string"},{"items":{"$ref":"#/components/schemas/ChatCompletionContentPartTextParam"},"type":"array"}],"title":"Content"},"role":{"type":"string","const":"tool","title":"Role"},"tool_call_id":{"type":"string","title":"Tool Call Id"}},"type":"object","required":["content","role","tool_call_id"],"title":"ChatCompletionToolMessageParam"},"ChatCompletionUserMessageParam":{"properties":{"content":{"anyOf":[{"type":"string"},{"items":{"anyOf":[{"$ref":"#/components/schemas/ChatCompletionContentPartTextParam"},{"$ref":"#/components/schemas/ChatCompletionContentPartImageParam"},{"$ref":"#/components/schemas/ChatCompletionContentPartInputAudioParam"},{"$ref":"#/components/schemas/File"}]},"type":"array"}],"title":"Content"},"role":{"type":"string","const":"user","title":"Role"},"name":{"type":"string","title":"Name"}},"type":"object","required":["content","role"],"title":"ChatCompletionUserMessageParam","description":"Messages sent by an end user, containing prompts or additional context\ninformation."},"ClassifiedDetection":{"properties":{"box":{"$ref":"#/components/schemas/BoundingBox"},"detection_class":{"type":"string","title":"Detection Class"},"detection_confidence":{"type":"number","title":"Detection Confidence"},"classification":{"type":"string","title":"Classification"},"classification_confidence":{"type":"number","title":"Classification Confidence"},"all_scores":{"additionalProperties":{"type":"number"},"type":"object","title":"All Scores"}},"type":"object","required":["box","detection_class","detection_confidence","classification","classification_confidence","all_scores"],"title":"ClassifiedDetection","description":"A detection with classification results."},"ClassifyRequest":{"properties":{"image":{"type":"string","title":"Image","description":"Base64-encoded 
image"},"model":{"type":"string","title":"Model","default":"clip-vit-base"},"classes":{"items":{"type":"string"},"type":"array","title":"Classes","description":"Classes for zero-shot classification"},"top_k":{"type":"integer","maximum":100.0,"minimum":1.0,"title":"Top K","default":5}},"type":"object","required":["image","classes"],"title":"ClassifyRequest"},"ClassifyResponse":{"properties":{"class_name":{"type":"string","title":"Class Name"},"class_id":{"type":"integer","title":"Class Id"},"confidence":{"type":"number","title":"Confidence"},"all_scores":{"additionalProperties":{"type":"number"},"type":"object","title":"All Scores"},"model":{"type":"string","title":"Model"},"inference_time_ms":{"type":"number","title":"Inference Time Ms"}},"type":"object","required":["class_name","class_id","confidence","all_scores","model","inference_time_ms"],"title":"ClassifyResponse"},"Custom":{"properties":{"input":{"type":"string","title":"Input"},"name":{"type":"string","title":"Name"}},"type":"object","required":["input","name"],"title":"Custom","description":"The custom tool that the model called."},"DetectClassifyRequest":{"properties":{"image":{"type":"string","title":"Image","description":"Base64-encoded image"},"detection_model":{"type":"string","title":"Detection Model","description":"YOLO model for detection","default":"yolov8n"},"classification_model":{"type":"string","title":"Classification Model","description":"CLIP model for classification","default":"clip-vit-base"},"classes":{"items":{"type":"string"},"type":"array","title":"Classes","description":"Classes for zero-shot classification of each crop"},"confidence_threshold":{"type":"number","maximum":1.0,"minimum":0.0,"title":"Confidence Threshold","description":"Detection confidence threshold","default":0.5},"detection_classes":{"anyOf":[{"items":{"type":"string"},"type":"array"},{"type":"null"}],"title":"Detection Classes","description":"Filter detections to these YOLO 
classes"},"top_k":{"type":"integer","maximum":100.0,"minimum":1.0,"title":"Top K","description":"Top-K classification results per crop","default":3},"min_crop_px":{"type":"integer","minimum":1.0,"title":"Min Crop Px","description":"Minimum crop dimension in pixels (skip tiny detections)","default":16}},"type":"object","required":["image","classes"],"title":"DetectClassifyRequest"},"DetectClassifyResponse":{"properties":{"results":{"items":{"$ref":"#/components/schemas/ClassifiedDetection"},"type":"array","title":"Results"},"total_detections":{"type":"integer","title":"Total Detections"},"classified_count":{"type":"integer","title":"Classified Count"},"detection_model":{"type":"string","title":"Detection Model"},"classification_model":{"type":"string","title":"Classification Model"},"detection_time_ms":{"type":"number","title":"Detection Time Ms"},"classification_time_ms":{"type":"number","title":"Classification Time Ms"},"total_time_ms":{"type":"number","title":"Total Time Ms"}},"type":"object","required":["results","total_detections","classified_count","detection_model","classification_model","detection_time_ms","classification_time_ms","total_time_ms"],"title":"DetectClassifyResponse"},"DetectRequest":{"properties":{"image":{"type":"string","title":"Image","description":"Base64-encoded image"},"model":{"type":"string","title":"Model","default":"yolov8n"},"confidence_threshold":{"type":"number","maximum":1.0,"minimum":0.0,"title":"Confidence Threshold","default":0.5},"classes":{"anyOf":[{"items":{"type":"string"},"type":"array"},{"type":"null"}],"title":"Classes"}},"type":"object","required":["image"],"title":"DetectRequest"},"DetectResponse":{"properties":{"detections":{"items":{"$ref":"#/components/schemas/Detection"},"type":"array","title":"Detections"},"model":{"type":"string","title":"Model"},"inference_time_ms":{"type":"number","title":"Inference Time 
Ms"}},"type":"object","required":["detections","model","inference_time_ms"],"title":"DetectResponse"},"Detection":{"properties":{"box":{"$ref":"#/components/schemas/BoundingBox"},"class_name":{"type":"string","title":"Class Name"},"class_id":{"type":"integer","title":"Class Id"},"confidence":{"type":"number","title":"Confidence"}},"type":"object","required":["box","class_name","class_id","confidence"],"title":"Detection"},"DetectionItem":{"properties":{"x1":{"type":"number","title":"X1"},"y1":{"type":"number","title":"Y1"},"x2":{"type":"number","title":"X2"},"y2":{"type":"number","title":"Y2"},"class_name":{"type":"string","title":"Class Name"},"class_id":{"type":"integer","title":"Class Id"},"confidence":{"type":"number","title":"Confidence"}},"type":"object","required":["x1","y1","x2","y2","class_name","class_id","confidence"],"title":"DetectionItem"},"File":{"properties":{"file":{"$ref":"#/components/schemas/FileFile"},"type":{"type":"string","const":"file","title":"Type"}},"type":"object","required":["file","type"],"title":"File","description":"Learn about [file inputs](https://platform.openai.com/docs/guides/text) for text generation."},"FileFile":{"properties":{"file_data":{"type":"string","title":"File Data"},"file_id":{"type":"string","title":"File Id"},"filename":{"type":"string","title":"Filename"}},"type":"object","title":"FileFile"},"Function":{"properties":{"arguments":{"type":"string","title":"Arguments"},"name":{"type":"string","title":"Name"}},"type":"object","required":["arguments","name"],"title":"Function","description":"The function that the model called."},"FunctionCall":{"properties":{"arguments":{"type":"string","title":"Arguments"},"name":{"type":"string","title":"Name"}},"type":"object","required":["arguments","name"],"title":"FunctionCall","description":"Deprecated and replaced by `tool_calls`.\n\nThe name and arguments of a function that should be called, as generated by the 
model."},"FunctionDefinition":{"properties":{"name":{"type":"string","title":"Name"},"description":{"type":"string","title":"Description"},"parameters":{"additionalProperties":true,"type":"object","title":"Parameters"},"strict":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Strict"}},"type":"object","required":["name"],"title":"FunctionDefinition"},"HTTPValidationError":{"properties":{"detail":{"items":{"$ref":"#/components/schemas/ValidationError"},"type":"array","title":"Detail"}},"type":"object","title":"HTTPValidationError"},"ImageURL":{"properties":{"url":{"type":"string","title":"Url"},"detail":{"type":"string","enum":["auto","low","high"],"title":"Detail"}},"type":"object","required":["url"],"title":"ImageURL"},"InputAudio":{"properties":{"data":{"type":"string","title":"Data"},"format":{"type":"string","enum":["wav","mp3"],"title":"Format"}},"type":"object","required":["data","format"],"title":"InputAudio"},"SessionInfo":{"properties":{"session_id":{"type":"string","title":"Session Id"},"frames_processed":{"type":"integer","title":"Frames Processed"},"actions_triggered":{"type":"integer","title":"Actions Triggered"},"escalations":{"type":"integer","title":"Escalations"},"chain":{"items":{"type":"string"},"type":"array","title":"Chain"},"idle_seconds":{"type":"number","title":"Idle Seconds"},"duration_seconds":{"type":"number","title":"Duration Seconds"}},"type":"object","required":["session_id","frames_processed","actions_triggered","escalations","chain","idle_seconds","duration_seconds"],"title":"SessionInfo"},"SessionsListResponse":{"properties":{"sessions":{"items":{"$ref":"#/components/schemas/SessionInfo"},"type":"array","title":"Sessions"},"count":{"type":"integer","title":"Count"}},"type":"object","required":["sessions","count"],"title":"SessionsListResponse"},"StreamFrameRequest":{"properties":{"session_id":{"type":"string","title":"Session Id"},"image":{"type":"string","title":"Image","description":"Base64-encoded 
image"}},"type":"object","required":["session_id","image"],"title":"StreamFrameRequest"},"StreamFrameResponse":{"properties":{"status":{"type":"string","title":"Status"},"detections":{"anyOf":[{"items":{"$ref":"#/components/schemas/DetectionItem"},"type":"array"},{"type":"null"}],"title":"Detections"},"confidence":{"anyOf":[{"type":"number"},{"type":"null"}],"title":"Confidence"},"resolved_by":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Resolved By"}},"type":"object","required":["status"],"title":"StreamFrameResponse"},"StreamStartRequest":{"properties":{"config":{"$ref":"#/components/schemas/CascadeConfigRequest"},"target_fps":{"type":"number","title":"Target Fps","default":1.0},"action_classes":{"anyOf":[{"items":{"type":"string"},"type":"array"},{"type":"null"}],"title":"Action Classes"},"cooldown_seconds":{"type":"number","title":"Cooldown Seconds","default":5.0}},"type":"object","title":"StreamStartRequest"},"StreamStartResponse":{"properties":{"session_id":{"type":"string","title":"Session Id"}},"type":"object","required":["session_id"],"title":"StreamStartResponse"},"StreamStopRequest":{"properties":{"session_id":{"type":"string","title":"Session Id"}},"type":"object","required":["session_id"],"title":"StreamStopRequest"},"StreamStopResponse":{"properties":{"session_id":{"type":"string","title":"Session Id"},"frames_processed":{"type":"integer","title":"Frames Processed"},"actions_triggered":{"type":"integer","title":"Actions Triggered"},"escalations":{"type":"integer","title":"Escalations"},"duration_seconds":{"type":"number","title":"Duration Seconds"}},"type":"object","required":["session_id","frames_processed","actions_triggered","escalations","duration_seconds"],"title":"StreamStopResponse"},"ValidationError":{"properties":{"loc":{"items":{"anyOf":[{"type":"string"},{"type":"integer"}]},"type":"array","title":"Location"},"msg":{"type":"string","title":"Message"},"type":{"type":"string","title":"Error 
Type"},"input":{"title":"Input"},"ctx":{"type":"object","title":"Context"}},"type":"object","required":["loc","msg","type"],"title":"ValidationError"}}}}
Evidence
The diff explicitly indicates openapi.json has no newline at end of file; a final newline is a
standard .editorconfig formatting requirement for consistent newline handling across tools.

AGENTS.md
pr_files_diffs/runtimes_edge_openapi_json.patch[8-8]
runtimes/edge/openapi.json[1-1]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`runtimes/edge/openapi.json` is missing a final newline, violating `.editorconfig` newline handling conventions.
## Issue Context
Many tooling chains and style checks expect a final newline to avoid noisy diffs and ensure consistent file formatting.
## Fix Focus Areas
- runtimes/edge/openapi.json[1-1]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
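The fix itself is trivial; as a sketch, a small helper (hypothetical, not part of the repo) can normalize the file in place:

```python
from pathlib import Path


def ensure_final_newline(path: str) -> bool:
    """Append a trailing newline if the file lacks one; True when modified."""
    p = Path(path)
    data = p.read_bytes()
    if data and not data.endswith(b"\n"):
        p.write_bytes(data + b"\n")
        return True
    return False
```

Running this once over runtimes/edge/openapi.json would satisfy the check; most editors and pre-commit's end-of-file-fixer hook do the same thing automatically.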


5. ARM64 tag not validated 🐞 Bug ⛯ Reliability
Description
_get_llamafarm_release_version() returns the GitHub “latest” release tag_name without validating
that the expected ARM64 binary asset exists. download_binary() then builds the ARM64 download URL
from that tag and raises RuntimeError on HTTP error, which can break ARM64 installs whenever the
latest release is missing the artifact.
Code

packages/llamafarm-llama/src/llamafarm_llama/_binary.py[R64-77]

+    # 2. Query GitHub API for latest release
try:
-        version = metadata.version("llamafarm-llama")
-        if version and not version.startswith("0.0.0"):
-            return f"v{version}"
-    except metadata.PackageNotFoundError:
-        pass
-    # Fallback for dev installs
-    return "v0.0.1"
+        import json
+
+        req = Request(
+            "https://api.github.com/repos/llama-farm/llamafarm/releases/latest",
+            headers={"User-Agent": "llamafarm-llama", "Accept": "application/vnd.github.v3+json"},
+        )
+        with urlopen(req, timeout=10) as response:
+            data = json.loads(response.read())
+            tag = data.get("tag_name")
+            if tag:
+                logger.info(f"Using latest LlamaFarm release: {tag}")
+                return tag
Evidence
The version-selection function returns the latest release tag without checking assets, and the
downloader uses that tag to construct the ARM64 URL; any missing asset results in a hard failure
during download.

packages/llamafarm-llama/src/llamafarm_llama/_binary.py[44-84]
packages/llamafarm-llama/src/llamafarm_llama/_binary.py[562-627]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
ARM64 binary downloads can fail because `_get_llamafarm_release_version()` returns the GitHub `releases/latest` tag without verifying that the ARM64 artifact exists for that release. `download_binary()` then constructs the ARM64 URL using that tag and raises when the asset is missing.
### Issue Context
The code comment/docstring states the function queries for the “latest release with the ARM64 binary”, but the implementation only reads `tag_name` and never checks `assets` or performs a lightweight existence check on the expected URL.
### Fix Focus Areas
- packages/llamafarm-llama/src/llamafarm_llama/_binary.py[44-84]
- packages/llamafarm-llama/src/llamafarm_llama/_binary.py[586-627]
### Suggested implementation direction
- Fetch `releases/latest` JSON and scan `assets` for the expected filename (`llama-{version}-bin-linux-arm64.zip`) OR
- Perform a `HEAD`/`GET` probe against the constructed artifact URL and only return the tag if it succeeds.
- If validation fails, fall back to the hardcoded `fallback` (or iterate through the most recent N releases until a match is found).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
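One way to follow that direction — sketched here under the assumption that the asset-name pattern is `llama-{tag}-bin-linux-arm64.zip`, as in the issue description — is a pure helper that inspects the parsed `releases/latest` JSON before trusting its tag:

```python
def pick_release_tag(release: dict, fallback: str = "v0.0.1") -> str:
    """Return the release's tag only if its expected ARM64 asset is listed."""
    tag = release.get("tag_name")
    if not tag:
        return fallback
    expected = f"llama-{tag}-bin-linux-arm64.zip"  # assumed naming pattern
    asset_names = {a.get("name") for a in release.get("assets", [])}
    return tag if expected in asset_names else fallback
```

`_get_llamafarm_release_version()` could call this on the decoded GitHub response instead of returning `tag_name` unconditionally, falling back to the pinned version when the artifact is absent.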


6. Tool stub unpack crash 🐞 Bug ✓ Correctness
Description
In edge chat completions, the ImportError fallback for utils.tool_calling defines
parse_tool_choice() to return None, but the streaming/non-streaming paths unconditionally unpack its
return into two variables. If the optional import fails, requests will crash with a TypeError during
tool_choice parsing.
Code

runtimes/edge/routers/chat_completions/service.py[R59-67]

+except ImportError:
+    # No-op stubs — edge doesn't support tool calling
+    def detect_probable_tool_call(*a, **kw): return False  # type: ignore[misc]
+    def detect_tool_call_in_content(*a, **kw): return None  # type: ignore[misc]
+    def extract_arguments_progress(*a, **kw): return ""  # type: ignore[misc]
+    def extract_tool_name_from_partial(*a, **kw): return None  # type: ignore[misc]
+    def is_tool_call_complete(*a, **kw): return False  # type: ignore[misc]
+    def parse_tool_choice(*a, **kw): return None  # type: ignore[misc]
+    def strip_tool_call_from_content(*a, **kw): return a[0] if a else ""  # type: ignore[misc]
Evidence
The ImportError branch defines a stub that returns None, while call sites always expect a 2-tuple
(mode, function_name). The real implementation returns a tuple, so the stub is incompatible and will
crash when that fallback is used.

runtimes/edge/routers/chat_completions/service.py[49-67]
runtimes/edge/routers/chat_completions/service.py[802-805]
runtimes/edge/utils/tool_calling.py[136-156]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
If `utils.tool_calling` fails to import, `parse_tool_choice` is stubbed to return `None`, but the service always unpacks its return value into `(tool_choice_mode, _)`, which will raise `TypeError: cannot unpack non-iterable NoneType object`.
### Issue Context
Even if tool calling is intended to be optional on edge, the fallback implementation must remain interface-compatible or the code must short-circuit tool logic when tool-calling support is unavailable.
### Fix Focus Areas
- runtimes/edge/routers/chat_completions/service.py[49-67]
- runtimes/edge/routers/chat_completions/service.py[802-806]
### Suggested implementation direction
- In the ImportError branch, define `parse_tool_choice` to return a valid default tuple like `("none", None)` (or `("auto", None)` depending on desired behavior).
- Consider introducing a module-level flag like `TOOL_CALLING_AVAILABLE = True/False` and set `should_detect_tools = False` when unavailable, to avoid relying on multiple stubs staying perfectly compatible over time.
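Both suggestions above can be combined in a short sketch. This is illustrative, not the service's actual code: the `handle_request` wrapper is hypothetical, and only `parse_tool_choice` is shown (the real fallback defines seven stubs).

```python
# Sketch of the suggested fix: keep the fallback stub interface-compatible
# with the real implementation (which returns a (mode, function_name)
# tuple), and gate tool detection on an availability flag.
try:
    from utils.tool_calling import parse_tool_choice  # real implementation
    TOOL_CALLING_AVAILABLE = True
except ImportError:
    TOOL_CALLING_AVAILABLE = False

    def parse_tool_choice(*a, **kw):
        # Must match the real return shape so unpacking never crashes.
        return ("none", None)

def handle_request(tool_choice):
    # Call sites can now unpack safely regardless of which branch ran...
    tool_choice_mode, _ = parse_tool_choice(tool_choice)
    # ...and skip tool logic entirely when the utils are unavailable,
    # rather than relying on every stub staying compatible over time.
    should_detect_tools = TOOL_CALLING_AVAILABLE and tool_choice_mode != "none"
    return should_detect_tools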





lspci -d 1e60: already filters by vendor ID, but the old check looked
for "1e60" in the output text. lspci resolves vendor IDs to names, so
the output contains "Hailo Technologies Ltd." instead of "1e60",
causing detection to always fail. Check for any non-empty output instead.
ConfiguredInferModel doesn't expose .output() in HailoRT 5.2.0.
Use self._infer_model.output().shape instead of
self._configured.output().shape to get the output buffer dimensions.
ConfiguredInferModel.run() does not accept timeout_ms in HailoRT 5.2.0.
HailoRT 5.2.0 accepts timeout as a positional arg, not keyword.
… bots

- Log warning on failed CPU offload instead of silent pass (base.py)
- Log debug on unified memory detection failure instead of silent pass
  (gguf_language_model.py)
- Remove unused _timing_start variable (gguf_language_model.py)
- Add path traversal validation in load_language() (server.py)
- Add path traversal validation in _read_gguf_metadata() (gguf_metadata_cache.py)
pydantic/ollama-action@v3 downloads a .tgz archive but Ollama v0.18.3
switched to .tar.zst, causing a 404. Use the official install script
which handles format changes internally.

Download the install script from a pinned release tag and verify its
SHA-256 before executing, addressing supply-chain risk from the
unpinned curl|sh approach.
Replace bare pass with debug-level logging for BOS/EOS token decode
failures. Behavior is unchanged but intent is documented and failures
are observable with debug logging enabled.
Ollama v0.19.0 ships .tar.zst archives. The install script falls back
to .tgz when zstd is missing, but .tgz no longer exists (404).
Replace bare pass with debug log when apply_chat_template fails,
improving observability while preserving the fallback behavior.
Replace bare pass with debug log in the per-token think-tag detection
hook. Keeps suppression to avoid breaking generation but makes failures
observable.
Replace empty except with debug log when parsing n_ctx_train from
GGUF metadata fields fails, consistent with other debug logging in
this module.
Replace empty except with debug log when /proc/meminfo is unavailable,
documenting the fallback to psutil.
The path traversal guard rejected all absolute paths, even when they
resolved within ~/.llamafarm or cwd. Keep the .. rejection but let
absolute paths through to the containment check.
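A guard with that shape might look like the sketch below. This is an assumption-laden illustration, not the server's actual validation code: the function name and the `roots` default are hypothetical stand-ins for the `~/.llamafarm`-or-cwd containment check described above.

```python
# Sketch of the described guard: reject ".." components outright, but let
# absolute paths through to a containment check against the allowed roots.
from pathlib import Path

def is_path_allowed(raw: str, roots=None) -> bool:
    # Still refuse any explicit parent-directory traversal.
    if ".." in Path(raw).parts:
        return False
    if roots is None:
        roots = [Path.home() / ".llamafarm", Path.cwd()]
    resolved = Path(raw).expanduser().resolve()
    # Absolute paths are fine as long as they resolve inside a root.
    return any(resolved.is_relative_to(r.resolve()) for r in roots)
```

Resolving both the candidate and the roots before comparing keeps the check correct when a root is itself a symlink (e.g. `/tmp` on macOS). `Path.is_relative_to` requires Python 3.9+.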
The done-callback raises ValueError if stop() already cleared
_pending_futures before the future completes.
Block Windows drive-style paths like C:/... that bypass the existing
validation by containing a forward slash.
The log_file argument was mistakenly removed. setup_logging does accept
it, so LOG_FILE env var was silently ignored.
The concurrency refactor stopped updating self.class_names, causing
get_model_info to report num_classes as 0.
Use functools.lru_cache to memoize _detect_hailo() instead of a
module-level global variable, removing the unused global warning.
The install script appends ?version=v0.19.0 to download URLs when
OLLAMA_VERSION is set, which causes the .tar.zst HEAD check to fail
and falls back to .tgz (404). The pinned install script from the
release provides sufficient version control.
@rachmlenig added the component::runtime and merge-when-approved labels on Mar 30, 2026
@mhamann mhamann merged commit 9920067 into main Mar 30, 2026
78 of 82 checks passed
@mhamann mhamann deleted the feat-runtime-edge-standalone branch March 30, 2026 20:44
